CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Authors

  • Yuxin Peng
  • Jinwei Qi
  • Yuxin Yuan
Abstract

It is known that the inconsistent distributions and representations of different modalities, such as image and text, cause the heterogeneity gap, which makes it very challenging to correlate such heterogeneous data. Recently, generative adversarial networks (GANs) have been proposed and have shown a strong ability to model data distributions and learn discriminative representations. Most existing GAN-based works focus on the unidirectional generative problem of producing new data, such as image synthesis. We have a completely different goal: to effectively correlate existing large-scale heterogeneous data of different modalities by using the power of GANs to model the cross-modal joint distribution. Thus, in this paper we propose Cross-modal Generative Adversarial Networks (CM-GANs) to learn a discriminative common representation for bridging the heterogeneity gap. The main contributions can be summarized as follows: (1) A cross-modal GAN architecture is proposed to model the joint distribution over the data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in the generative and discriminative models, which compete with each other to promote cross-modal correlation learning. (2) Cross-modal convolutional autoencoders with a weight-sharing constraint are proposed to form the generative model. They not only exploit the cross-modal correlation for learning the common representation, but also preserve the reconstruction information for capturing the semantic consistency within each modality. (3) A cross-modal adversarial mechanism is proposed, which uses two kinds of discriminative models to conduct intra-modality and inter-modality discrimination simultaneously. The two mutually boost each other to make the generated common representation more discriminative through the adversarial training process.
To the best of our knowledge, our proposed CM-GANs approach is the first to use GANs for cross-modal common representation learning, by which heterogeneous data can be effectively correlated. Extensive experiments on the cross-modal retrieval paradigm verify the performance of our proposed approach, compared with 10 state-of-the-art methods on 3 cross-modal datasets.
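
The three components named in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: it assumes fully connected layers in place of the convolutional autoencoders, single-unit logistic discriminators, and a simplified reading of the two discrimination tasks (original versus reconstructed within a modality, image-derived versus text-derived in the common space). All layer sizes, names, and the random inputs are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Create a (weights, bias) pair for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def forward(x, layer):
    """Apply one layer with a tanh non-linearity."""
    w, b = layer
    return np.tanh(x @ w + b)

def discriminate(x, layer):
    """Single logistic unit: probability that x belongs to class 1."""
    w, b = layer
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def bce(p, target):
    """Binary cross-entropy against a constant target label."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

# Hypothetical feature sizes, chosen only for illustration.
IMG_DIM, TXT_DIM, HID_DIM, COMMON_DIM = 64, 32, 16, 8

# Generative model: one autoencoder pathway per modality.
img_enc, txt_enc = dense(IMG_DIM, HID_DIM), dense(TXT_DIM, HID_DIM)
shared = dense(HID_DIM, COMMON_DIM)  # same layer used by both pathways (weight sharing)
img_dec, txt_dec = dense(COMMON_DIM, IMG_DIM), dense(COMMON_DIM, TXT_DIM)

# Discriminative models: one intra-modality per modality, one inter-modality.
d_intra_img, d_intra_txt = dense(IMG_DIM, 1), dense(TXT_DIM, 1)
d_inter = dense(COMMON_DIM, 1)

def generate(x, enc, dec):
    """Encode into the common space, then reconstruct the input modality."""
    common = forward(forward(x, enc), shared)
    return common, forward(common, dec)

images = rng.normal(size=(4, IMG_DIM))  # stand-in image features
texts = rng.normal(size=(4, TXT_DIM))   # stand-in text features

img_common, img_recon = generate(images, img_enc, img_dec)
txt_common, txt_recon = generate(texts, txt_enc, txt_dec)

# Intra-modality discrimination: original features (1) vs reconstructions (0).
intra_loss = (
    bce(discriminate(images, d_intra_img), 1.0)
    + bce(discriminate(img_recon, d_intra_img), 0.0)
    + bce(discriminate(texts, d_intra_txt), 1.0)
    + bce(discriminate(txt_recon, d_intra_txt), 0.0)
)

# Inter-modality discrimination: image-derived (1) vs text-derived (0) codes.
inter_loss = (
    bce(discriminate(img_common, d_inter), 1.0)
    + bce(discriminate(txt_common, d_inter), 0.0)
)

print(img_common.shape, txt_common.shape, intra_loss, inter_loss)
```

In adversarial training the discriminators would minimize these losses while the generative pathways are updated to increase them; the sketch stops at the loss computation and omits gradients and the optimizer.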

Similar resources

SyncGAN: Synchronize the Latent Space of Cross-modal Generative Adversarial Networks

The generative adversarial network (GAN) has achieved impressive success in cross-domain generation, but it faces difficulty in cross-modal generation due to the lack of a common distribution between heterogeneous data. Most existing conditional cross-modal GAN methods adopt a one-directional transfer strategy and have achieved preliminary success on text-to-image transfer. Instead o...

Adversarial Feature Learning

The ability of the Generative Adversarial Networks (GANs) framework to learn generative models mapping from simple latent distributions to arbitrarily complex data distributions has been demonstrated empirically, with compelling results showing generators learn to “linearize semantics” in the latent space of such models. Intuitively, such latent spaces may serve as useful feature representation...

Metric Learning-based Generative Adversarial Network

Generative Adversarial Networks (GANs), as a framework for estimating generative models via an adversarial process, have attracted huge attention and have proven to be powerful in a variety of tasks. However, training GANs is well known for being delicate and unstable, partially caused by its sigmoid cross entropy loss function for the discriminator. To overcome such a problem, many researchers...

IVE-GAN: Invariant Encoding Generative Adversarial Networks

Generative adversarial networks (GANs) are a powerful framework for generative tasks. However, they are difficult to train and tend to miss modes of the true data generation process. Although GANs can learn a rich representation of the covered modes of the data in their latent space, the framework misses an inverse mapping from data to this latent space. We propose Invariant Encoding Generative...

Logo Synthesis and Manipulation with Clustered Generative Adversarial Networks

Designing a logo for a new brand is a lengthy and tedious back-and-forth process between a designer and a client. In this paper we explore to what extent machine learning can solve the creative task of the designer. For this, we build a dataset – LLD – of 600k+ logos crawled from the world wide web. Training Generative Adversarial Networks (GANs) for logo synthesis on such multi-modal data is n...

Journal title:
  • CoRR

Volume abs/1710.05106

Publication date 2017